Band Division Noise Suppressor and Band Division Noise Suppressing Method

ABSTRACT

A band division noise suppressor that suppresses noise sufficiently with a small amount of processing and little voice distortion. In the band division noise suppressor, a band dividing section (101) divides an input voice signal into a low band voice signal and a high band voice signal. The low band voice signal is decimated at a decimation section (102), subjected to noise suppression at a low band noise suppressing section (103), and then interpolated at an interpolation section (104). The high band voice signal, on the other hand, is subjected to noise suppression at a high band noise suppressing section (105). A band combination section (106) combines the noise-suppressed low band and high band voice signals and outputs a voice signal in which noise is suppressed over the entire band.

TECHNICAL FIELD

The present invention relates to a band division noise suppression apparatus and band division noise suppression method that divide background noise into a high band component and a low band component and suppress the background noise, and more specifically, to a band division noise suppression apparatus and band division noise suppression method that are suitable for use in mobile terminal apparatus.

BACKGROUND ART

Generally, a low bit rate speech coding apparatus can provide high quality communication for speech that includes little background noise. However, for speech that includes background noise, the abrasive distortion unique to low bit rate coding occurs and can degrade speech quality. The noise suppression/speech emphasis technologies used to deal with this degradation are classified into time domain processing technologies and frequency domain processing technologies.

As a noise suppression/speech emphasis technology in the time domain, for example, the technology disclosed in Patent Document 1 is known. That is, Patent Document 1 discloses a technology that distinguishes between a speech segment and a non-speech segment by changing a suppression factor determined by the short segment power of an input speech signal according to the estimated non-speech segment power, and thereby performs appropriate noise suppression.

Furthermore, as a noise suppression/speech emphasis technology in the frequency domain, for example, the technology disclosed in Patent Document 2 is known. That is, in Patent Document 2, band division is performed on an input signal, the ratio between speech signal and noise signal is estimated for the signal of each band, and noise is suppressed by multiplying the input signal of each band by a gain factor for noise suppression calculated based on that ratio. Patent Document 2 also discloses a technology that masks the distortion caused at that time by adding a few pseudo background noise signals which are similar to a noise spectrum, according to the ratio between speech signal and noise signal, and thereby enables effective noise reduction with little distortion. This method distinguishes between bands where speech is large (SN ratio is large) and bands where noise is large (SN ratio is small) and adds appropriate pseudo background noise, and therefore musical noise is suppressed and speech quality is expected to improve when the SN ratio is small.

Furthermore, Patent Document 3 proposes a method for repairing a missing pitch harmonic power spectrum based on two kinds of comb filters generated as extraction and repairing standards of a pitch harmonic power spectrum. This method actively utilizes characteristics of a speech signal (for example, the speech pitch harmonic power spectrum), so that it is possible to distinguish between speech band and noise band with high accuracy, reduce speech distortion and remove noise adequately.

-   Patent Document 1: Japanese Patent Publication No. 3437264
-   Patent Document 2: Japanese Patent Publication No. 3309895
-   Patent Document 3: Japanese Patent Application Laid-Open No. 2002-149200

DISCLOSURE OF INVENTION

Problems to be Solved by the Invention

However, these conventional technologies have the following problems. That is, the noise suppression/speech emphasis technology in the time domain disclosed in Patent Document 1 only requires a simple processing method and a small amount of calculation, but cannot set a suppression factor in detail for each frequency component using the frequency characteristics of speech and noise. Therefore, there is a limit to how well noise can be suppressed with little speech distortion.

Furthermore, with the noise suppression/speech emphasis technology in the frequency domain disclosed in Patent Document 2, part of the speech information (SN ratio) is used, but speech signal characteristics (for example, the speech pitch harmonic power spectrum) are not actively used. As a result, it is difficult to distinguish between speech band and noise band with high accuracy, and therefore it is considered difficult to reduce speech distortion and remove noise adequately.

Furthermore, the method for repairing a missing pitch harmonic power spectrum disclosed in Patent Document 3 requires a long discrete Fourier transform length to extract a pitch harmonic power spectrum accurately, and therefore the amount of calculation increases. This becomes a problem when the method is applied to a noise suppression apparatus in mobile terminal apparatus.

It is therefore an object of the present invention to provide a band division noise suppression apparatus and band division noise suppression method having little speech distortion and a large amount of noise suppression with a small amount of processing.

Means for Solving the Problem

The band division noise suppression apparatus according to the present invention adopts a configuration having: a band division section that performs band division on an input speech signal into a low band speech signal including a low frequency noise component and a high band speech signal including a high frequency noise component; a decimation processing section that performs down-sampling on the low band speech signal; a low band noise suppression section that suppresses noise included in the low band speech signal subjected to the decimation processing; an interpolation processing section that performs up-sampling on the noise-suppressed low band speech signal; a high band noise suppression section that suppresses noise included in the high band speech signal; and a band combination section that combines the low band speech signal subjected to the interpolation processing and the high band speech signal subjected to the noise suppression processing.

Furthermore, the band division noise suppression method according to the present invention has: a band division step of performing band division on an input speech signal into a low band speech signal including a low frequency noise component and a high band speech signal including a high frequency noise component; a decimation processing step of performing down-sampling and decimation processing on the low band speech signal; a low band noise suppression step of suppressing noise included in the low band speech signal subjected to the decimation processing; an interpolation processing step of performing up-sampling and interpolation processing on the noise-suppressed low band speech signal; a high band noise suppression step of suppressing noise included in the high band speech signal; and a band combination step of combining the low band speech signal subjected to the interpolation processing and the high band speech signal subjected to the noise suppression processing.

ADVANTAGEOUS EFFECT OF THE INVENTION

According to the present invention, the input speech signal is divided into the low band signal and the high band signal, and decimation processing is performed on the low band signal, so that it is possible to reduce the discrete Fourier transform length used in low band noise suppression processing without decreasing the extraction accuracy of a pitch harmonic power spectrum. Furthermore, a noise suppression processing technique simpler than the low band noise suppression processing is applied to the high band signal. Therefore, it is possible to provide a band division noise suppression apparatus and band division noise suppression method having little distortion and a large amount of noise suppression with a small amount of processing.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram showing a configuration of a band division noise suppression apparatus according to an embodiment of the present invention;

FIG. 2 is a block diagram showing a configuration example of the low band noise suppression section shown in FIG. 1;

FIG. 3 is a block diagram showing a configuration example of the high band noise suppression section shown in FIG. 1; and

FIG. 4 is a spectrogram illustrating the operation in a material element of the low band noise suppression section shown in FIG. 2.

BEST MODE FOR CARRYING OUT THE INVENTION

Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

Embodiment 1

FIG. 1 is a block diagram showing a configuration of the band division noise suppression apparatus according to an embodiment of the present invention. In FIG. 1, band division noise suppression apparatus 100 according to this embodiment has: band division section 101; decimation processing section 102; low band noise suppression section 103; interpolation processing section 104; high band noise suppression section 105; and band combination section 106.

Furthermore, FIG. 2 is a block diagram showing a configuration example of low band noise suppression section 103 shown in FIG. 1. Low band noise suppression section 103 shown in FIG. 2 has: windowing section 201; FFT section 202; low band noise base estimation section 203; band-specific voiced/noise detection section 204; pitch harmonic structure extraction section 205; voicedness determination section 206; pitch frequency estimation section 207; pitch harmonic structure repairing section 208; band-specific voiced/noise correction section 209; subtraction/attenuation coefficient calculation section 210; low band multiplication section 211; and IFFT section 212.

Furthermore, FIG. 3 is a block diagram showing a configuration example of high band noise suppression section 105 shown in FIG. 1. High band noise suppression section 105 shown in FIG. 3 has: high band noise base estimation section 301; SN ratio estimation section 302; speech/noise frame determination section 303; suppression coefficient calculation section 304; suppression coefficient adjustment section 305; suppression coefficient averaging processing section 306; and high band multiplication section 307.

Next, the noise suppression operation performed in band division noise suppression apparatus 100 configured as described above will be explained with reference to FIGS. 1 to 4. In addition, FIG. 4 is a spectrogram illustrating the operation in a material element of low band noise suppression section 103 shown in FIG. 2.

In FIG. 1, band division section 101 divides an input speech signal including noise into a speech signal including a low frequency noise component (hereinafter referred to as “a low band speech signal”) S_(L) and a speech signal including a high frequency noise component (hereinafter referred to as “a high band speech signal”) S_(H) using an FIR (Finite Impulse Response) type or IIR (Infinite Impulse Response) type lowpass filter and highpass filter.

The divided low band speech signal S_(L) is subjected to noise suppression processing via a route of decimation processing section 102, low band noise suppression section 103 and interpolation processing section 104, and inputted to band combination section 106. On the other hand, the divided high band speech signal S_(H) is subjected to noise suppression processing at high band noise suppression section 105, and inputted to band combination section 106. Band combination section 106 performs band combination processing on the noise-suppressed low band and high band speech signals, and outputs a full band speech signal in which the noise component is suppressed to a low level, as the output of band division noise suppression apparatus 100.

First, the noise suppression processing of low band speech signal S_(L) performed through decimation processing section 102, low band noise suppression section 103 and interpolation processing section 104 will be described.

Decimation processing section 102 performs down-sampling on the inputted low band speech signal S_(L), generates decimated low band speech signal S_(D) and provides the result to low band noise suppression section 103. For example, decimation processing section 102 performs half down-sampling on low band speech signal S_(L)(i) using equation (1) below and generates decimated low band speech signal S_(D)(i).

[Equation 1]

$S_{D}(i) = S_{L}(2 \cdot i) \qquad (1)$
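
Equation (1) can be illustrated with a minimal sketch. It assumes the low band signal is held in a Python list; the function name `decimate_by_two` is hypothetical and does not appear in the source.

```python
def decimate_by_two(s_l):
    """Half down-sampling per equation (1): S_D(i) = S_L(2*i)."""
    # Keep every second sample, starting with index 0.
    return s_l[::2]


s_d = decimate_by_two([10, 11, 12, 13, 14, 15])
print(s_d)
```

The sketch relies on the lowpass filter of band division section 101 having already limited the bandwidth of S_(L); no additional anti-aliasing filter is applied here.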

Low band noise suppression section 103 performs noise suppression processing on the decimated low band speech signal S_(D) and provides the processing result to interpolation processing section 104. There are various low band noise suppression processing methods, but here, the noise suppression processing method shown in Patent Document 3 will be described as one example. FIG. 2 is configured so that the noise suppression method shown in Patent Document 3 is performed. The noise suppression method will be described with reference to FIG. 2 and FIG. 4.

In FIG. 2, windowing section 201 separates low band speech signal S_(D) inputted from decimation processing section 102 into predetermined time units (frames), performs windowing processing using the Hanning window or the like, and outputs the result to FFT section 202.

FFT section 202 performs FFT (Fast Fourier Transform) processing on the speech signal of frame units inputted from windowing section 201 and transforms the speech signal on the time axis into a signal on the frequency axis (speech power spectrum). In this way, the speech signal of frame units becomes a speech power spectrum having a predetermined frequency band. The generated speech power spectrum is inputted to low band noise base estimation section 203, band-specific voiced/noise detection section 204, pitch harmonic structure extraction section 205, voicedness determination section 206, subtraction/attenuation coefficient calculation section 210 and low band multiplication section 211.

Speech power spectrum S_(F)(k) in frequency component k acquired at FFT section 202 is expressed by equation (2) below.

[Equation 2]

$S_{F}(k) = \sqrt{\mathrm{Re}\{D_{F}(k)\}^{2} + \mathrm{Im}\{D_{F}(k)\}^{2}} \qquad 1 \leq k \leq HB/2 \qquad (2)$

In equation (2), k is a number which specifies a frequency component. HB is the FFT transform length, that is, the number of data on which the fast Fourier transform is performed. For example, HB=256. Furthermore, Re{D_(F)(k)} and Im{D_(F)(k)} indicate respectively the real part and the imaginary part of FFT transformed speech power spectrum D_(F)(k).
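
As an illustration of equation (2), the following sketch computes S_(F)(k) for 1≦k≦HB/2 from one frame. It uses a naive discrete Fourier transform from the standard library purely for clarity, which is an assumption of this sketch (the source uses an FFT), and the function name `amplitude_spectrum` is hypothetical.

```python
import cmath
import math


def amplitude_spectrum(frame):
    """Return S_F(k) for 1 <= k <= HB/2, where HB = len(frame)."""
    hb = len(frame)
    spectrum = []
    for k in range(1, hb // 2 + 1):
        # Naive DFT bin D_F(k); a practical implementation would use an FFT.
        d = sum(x * cmath.exp(-2j * math.pi * k * i / hb)
                for i, x in enumerate(frame))
        # Equation (2): square root of Re^2 + Im^2.
        spectrum.append(math.sqrt(d.real ** 2 + d.imag ** 2))
    return spectrum


# A cosine with exactly one cycle per 8-sample frame concentrates in k = 1.
frame = [math.cos(2 * math.pi * i / 8) for i in range(8)]
print(amplitude_spectrum(frame))
```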

First, low band noise base estimation section 203 applies inputted speech power spectrum S_(F)(k) to equation (3) below and estimates a frequency amplitude spectrum of a signal including only the noise component, that is, noise base N_(B)(n,k).

$\begin{matrix}\left\lbrack {{Equation}\mspace{20mu} 3} \right\rbrack & \; \\{{N_{B}\left( {n,k} \right)} = \left\{ {{\begin{matrix}{N_{B}\left( {{n - 1},k} \right)} & {{S_{F}(k)} > {\Theta_{B} \cdot {N_{B}\left( {{n - 1},k} \right)}}} \\{{\left( {1 - \alpha} \right) \cdot {N_{B}\left( {{n - 1},k} \right)}} + {\alpha \cdot {S_{F}(k)}}} & {{S_{F}(k)} \leq {\Theta_{B} \cdot {N_{B}\left( {{n - 1},k} \right)}}}\end{matrix}\mspace{20mu} 1} \leq k \leq {{HB}/2}} \right.} & (3)\end{matrix}$

In equation (3), n is a frame number. N_(B)(n−1,k) is the estimated value of the noise base in the preceding frame. α is a noise base moving average coefficient. Furthermore, Θ_(B) is a threshold value for distinguishing between speech component and noise component.

Then, low band noise base estimation section 203 compares, for each frequency component in the frequency band of the speech power spectrum, the speech power spectrum generated from the latest frame by FFT section 202 with the noise base estimated from the speech power spectra of the frames before the latest frame. As a result of the comparison, if the power difference between the two exceeds the threshold value set in advance, the latest frame is determined to include a speech component, and the noise base is not updated. On the other hand, if the difference does not exceed the threshold value, the latest frame is determined not to include a speech component, and the noise base is updated.
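
The per-component update of equation (3) can be sketched as follows. The function name `update_noise_base` and the numeric values of α and Θ_(B) are illustrative assumptions; the source does not give concrete values.

```python
def update_noise_base(prev_noise_base, s_f, alpha, theta_b):
    """Equation (3), applied to every frequency component k."""
    updated = []
    for n_prev, s in zip(prev_noise_base, s_f):
        if s > theta_b * n_prev:
            # Component judged to contain speech: keep the previous estimate.
            updated.append(n_prev)
        else:
            # Component judged to be noise only: moving-average update.
            updated.append((1.0 - alpha) * n_prev + alpha * s)
    return updated


# First component is held, second component is updated toward 1.5.
print(update_noise_base([1.0, 1.0], [5.0, 1.5], alpha=0.1, theta_b=2.0))
```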

In this way, the estimated noise base is inputted to band-specific voiced/noise detection section 204, pitch harmonic structure extraction section 205, voicedness determination section 206, pitch frequency estimation section 207 and subtraction/attenuation coefficient calculation section 210.

Next, band-specific voiced/noise detection section 204 applies speech power spectrum S_(F)(k) from FFT section 202 and noise base estimate value N_(B)(n,k) from low band noise base estimation section 203 to equation (4) below and detects voiced band and noise band in speech power spectrum S_(F)(k). Detection result S_(N)(k) is inputted to band-specific voiced/noise correction section 209.

$\begin{matrix}\left\lbrack {{Equation}\mspace{20mu} 4} \right\rbrack & \; \\{{S_{N}(k)} = \left\{ {{\begin{matrix}{{S_{F}(k)} - {\gamma_{1} \cdot {N_{B}\left( {n,k} \right)}}} & {{S_{F}(k)} > {\gamma_{1} \cdot {N_{B}\left( {n,k} \right)}}} \\0 & {{S_{F}(k)} \leq {\gamma_{1} \cdot {N_{B}\left( {n,k} \right)}}}\end{matrix}1} \leq k \leq {{HB}/2}} \right.} & (4)\end{matrix}$

As shown in equation (4), the difference between speech power spectrum S_(F)(k) and noise base estimate value N_(B)(n,k) multiplied by constant γ₁ is calculated, and if the result is greater than zero, the band is determined to be voiced band including speech; otherwise, the band is determined to be noise band not including speech. FIG. 4(A) is one example of detection result S_(N)(k) of voiced band and noise band determined and detected using equation (4).

Next, pitch harmonic structure extraction section 205 applies speech power spectrum S_(F)(k) inputted from FFT section 202 and noise base estimate value N_(B)(n,k) inputted from low band noise base estimation section 203 to equation (5) below, extracts pitch harmonic power spectrum H_(M)(k), and outputs extraction result H_(M)(k) to voicedness determination section 206 and pitch harmonic structure repairing section 208.

$\begin{matrix}\left\lbrack {{Equation}\mspace{20mu} 5} \right\rbrack & \; \\{{H_{M}(k)} = \left\{ {{\begin{matrix}{{S_{F}(k)} - {\gamma_{2} \cdot {N_{B}\left( {n,k} \right)}}} & {{S_{F}(k)} > {\gamma_{2} \cdot {N_{B}\left( {n,k} \right)}}} \\0 & {{S_{F}(k)} \leq {\gamma_{2} \cdot {N_{B}\left( {n,k} \right)}}}\end{matrix}1} \leq k \leq {{HB}/2}} \right.} & (5)\end{matrix}$

As shown in equation (5), the difference between speech power spectrum S_(F)(k) and noise base estimate value N_(B)(n,k) multiplied by constant γ₂ (γ₂>γ₁) is calculated, and if the result is greater than zero, the band is determined to include pitch harmonic power spectrum H_(M)(k); otherwise, the band is determined not to include pitch harmonic power spectrum H_(M)(k). FIG. 4(B) is one example of the extraction result of pitch harmonic power spectrum H_(M)(k) extracted using equation (5).
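
Equations (4) and (5) share one form and differ only in the constant, so a single sketch can cover both. The function name `subtract_scaled_noise` and the values chosen for γ₁ and γ₂ are assumptions for illustration only.

```python
def subtract_scaled_noise(s_f, noise_base, gamma):
    """Shared form of equations (4) and (5) for every component k."""
    # The component survives only where S_F(k) exceeds gamma * N_B(n,k).
    return [s - gamma * n if s > gamma * n else 0.0
            for s, n in zip(s_f, noise_base)]


s_f = [4.0, 1.2, 2.5]
noise_base = [1.0, 1.0, 1.0]
s_n = subtract_scaled_noise(s_f, noise_base, gamma=1.5)  # gamma_1, equation (4)
h_m = subtract_scaled_noise(s_f, noise_base, gamma=3.0)  # gamma_2 > gamma_1, equation (5)
print(s_n, h_m)
```

Because γ₂ is larger than γ₁, fewer components survive in H_(M)(k) than in S_(N)(k), which matches the stricter extraction of pitch harmonics described above.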

Next, voicedness determination section 206 determines the voicedness of speech power spectrum S_(F)(k) based on noise base estimate value N_(B)(n,k) inputted from low band noise base estimation section 203 and the extraction result of the pitch harmonic power spectrum inputted from pitch harmonic structure extraction section 205, and outputs the determination result to pitch frequency estimation section 207 and pitch harmonic structure repairing section 208.

Specifically, voicedness determination section 206, for example, calculates the ratio between the sum of pitch harmonic power spectrum H_(M)(k) and the sum of noise base estimate value N_(B)(n,k) in a predetermined frequency band using equation (6) and determines the degree of voicedness based on the result. Pitch frequency estimation section 207 and pitch harmonic structure repairing section 208, which receive the determination result, perform pitch frequency estimation and pitch harmonic structure repairing when the degree of voicedness is determined to be high, and do not perform them when the degree of voicedness is determined to be low. In equation (6), HP is the upper limit frequency component of the predetermined frequency band.

$\begin{matrix}\left\lbrack {{Equation}\mspace{20mu} 6} \right\rbrack & \; \\{V_{S} = {\sum\limits_{k = 1}^{HP}\; {{H_{M}(k)}/{\sum\limits_{k}^{HP}\; {N_{B}\left( {n,k} \right)}}}}} & (6)\end{matrix}$

Next, pitch frequency estimation section 207 estimates the pitch frequency based on speech power spectrum S_(F)(k) inputted from FFT section 202, noise base estimate value N_(B)(n,k) inputted from low band noise base estimation section 203 and the voicedness determination result inputted from voicedness determination section 206. At this time, if voicedness determination section 206 has determined that the voicedness of the speech power spectrum is equal to or lower than the predetermined level, pitch frequency estimation is skipped. The estimation result is inputted to pitch harmonic structure repairing section 208. There are various pitch frequency estimation methods; for example, the autocorrelation method, which uses the autocorrelation function of the speech waveform, and the modified correlation method, which uses the autocorrelation function of the residual signal of LPC analysis, can be used.

Next, pitch harmonic structure repairing section 208 repairs the pitch harmonic power spectrum based on the extraction result of the pitch harmonic power spectrum inputted from pitch harmonic structure extraction section 205, the voicedness determination result inputted from voicedness determination section 206 and the pitch frequency estimate value inputted from pitch frequency estimation section 207. At this time, if voicedness determination section 206 has determined that the voicedness of the speech power spectrum is equal to or lower than the predetermined level, repairing of the pitch harmonic power spectrum is skipped. The repaired pitch harmonic power spectrum is inputted to band-specific voiced/noise correction section 209.

If voicedness determination section 206 determines that the voicedness of the speech power spectrum is high, pitch harmonic structure repairing section 208 repairs the pitch harmonic power spectrum using, for example, the following procedure.

That is, pitch harmonic structure repairing section 208 first extracts pitch harmonic peaks in pitch harmonic power spectrum H_(M)(k). For example, as shown in FIG. 4(C), peaks P1 to P5 and P9 to P12 are extracted.

Next, pitch harmonic structure repairing section 208 calculates the intervals between the extracted peaks. When a calculated interval exceeds a predetermined threshold value (for example, 1.5 times the pitch frequency), the missing peaks (peaks P6, P7 and P8 shown in FIG. 4(D)) in pitch harmonic power spectrum H_(M)(k) are inserted based on the estimated pitch frequency m. In this way, pitch harmonic power spectrum H_(M)(k) is repaired.
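
The peak repairing procedure above can be sketched as follows. The source states only that missing peaks are inserted based on the estimated pitch frequency, so placing them at multiples of the pitch spacing from the left neighbouring peak is an assumption of this sketch, as are the function name `repair_peaks` and the representation of peaks as frequency positions in a sorted list.

```python
def repair_peaks(peaks, pitch, factor=1.5):
    """Insert missing harmonic peaks where a gap exceeds factor * pitch."""
    repaired = []
    for left, right in zip(peaks, peaks[1:]):
        repaired.append(left)
        if right - left > factor * pitch:
            # Number of harmonics that should lie strictly inside the gap.
            missing = round((right - left) / pitch) - 1
            repaired.extend(left + pitch * j for j in range(1, missing + 1))
    if peaks:
        repaired.append(peaks[-1])
    return repaired


# Nine extracted peaks with three missing between 50 and 90 (pitch spacing 10),
# analogous to P1 to P5 and P9 to P12 with P6 to P8 missing.
print(repair_peaks([10, 20, 30, 40, 50, 90, 100, 110, 120], pitch=10))
```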

Next, band-specific voiced/noise correction section 209 combines the repairing result inputted from pitch harmonic structure repairing section 208 and the detection result inputted from band-specific voiced/noise detection section 204, corrects the band-specific voiced/noise detection result, and outputs the correction result to subtraction/attenuation coefficient calculation section 210.

Specifically, band-specific voiced/noise correction section 209 compares the pitch harmonic structure repairing result shown in FIG. 4(D) and the band-specific voiced/noise detection result S_(N)(k) shown in FIG. 4(A). Then, the band that overlaps the pitch harmonic structure repairing result is regarded as voiced band, and the rest of the band is regarded as noise band. In this way, band-specific voiced/noise correction section 209 corrects band-specific voiced/noise detection result S_(N)(k) from band-specific voiced/noise detection section 204. FIG. 4(E) is one example of a result of correcting the band-specific voiced/noise detection result shown in FIG. 4(A).

As shown in FIG. 4(E), band-specific voiced/noise correction section 209 regards a part that overlaps the repaired pitch harmonic power spectrum H_(M)(k) as voiced band, and a part that does not overlap the repaired pitch harmonic power spectrum H_(M)(k) as noise band. In this way, detection result S_(N)(k) is corrected.

Next, subtraction/attenuation coefficient calculation section 210 calculates a subtraction/attenuation coefficient based on speech power spectrum S_(F)(k) inputted from FFT section 202, noise base estimate value N_(B)(n,k) inputted from low band noise base estimation section 203 and the correction result inputted from band-specific voiced/noise correction section 209, and outputs the result to low band multiplication section 211.

Specifically, subtraction/attenuation coefficient calculation section 210 calculates subtraction/attenuation coefficient G_(C)(k) for both voiced band and noise band in the corrected detection result S_(N)(k) based on speech power spectrum S_(F)(k) and noise base N_(B)(n,k) using equation (7) below. In equation (7), μ is a constant. Furthermore, g_(C) is a predetermined constant which is greater than zero and smaller than 1.

$\begin{matrix}\left\lbrack {{Equation}\mspace{20mu} 7} \right\rbrack & \; \\{{G_{C}(k)} = \left\{ {{\begin{matrix}{{{{S_{F}(k)} - {\mu \cdot {N_{B}\left( {n,k} \right)}}}}/{S_{F}(k)}} & {speechband} \\g_{C} & {noiseband}\end{matrix}1} \leq k \leq {{HB}/2}} \right.} & (7)\end{matrix}$

Next, low band multiplication section 211 multiplies the voiced band and noise band of the speech power spectrum inputted from FFT section 202 by the subtraction/attenuation coefficient inputted from subtraction/attenuation coefficient calculation section 210. By this means, a speech power spectrum in which the noise component in the low band speech signal is suppressed is obtained. This multiplication result is inputted to IFFT section 212.

IFFT section 212 performs IFFT (Inverse Fast Fourier Transform) processing on the noise-suppressed speech power spectrum inputted from low band multiplication section 211. By this means, low band speech signal S_(E) on the time axis is generated from the speech power spectrum in which the noise component is suppressed. Generated low band speech signal S_(E) is inputted to interpolation processing section 104.

Interpolation processing section 104 performs interpolation processing by, for example, double up-sampling on noise-suppressed low band speech signal S_(E)(i) using equation (8) below, generates noise-suppressed low band speech signal S_(I)(i), and provides the result to one input end of band combination section 106.

$\begin{matrix}\left\lbrack {{Equation}\mspace{20mu} 8} \right\rbrack & \; \\{{S_{I}(i)} = \left\{ \begin{matrix}{S_{E}\left( {i/2} \right)} & {{i = 0},{\pm 2},{\pm 4},{\pm 6},\ldots} \\0 & {others}\end{matrix} \right.} & (8)\end{matrix}$

Next, the operation of high band noise suppression section 105 performing noise suppression processing on divided high band speech signal S_(H) will be described with reference to FIG. 3. In FIG. 3, divided high band speech signal S_(H) is inputted to high band noise base estimation section 301, SN ratio estimation section 302, speech/noise frame determination section 303, suppression coefficient calculation section 304 and high band multiplication section 307.

High band noise base estimation section 301 estimates the noise signal power included in inputted high band speech signal S_(H) using equations (9) and (10) below, and outputs the estimation result together with high band speech signal S_(H) to SN ratio estimation section 302, speech/noise frame determination section 303, and suppression coefficient calculation section 304.

That is, high band noise base estimation section 301 first calculates addition value S(n) of high band speech signal power using equation (9) below.

$\begin{matrix}\left\lbrack {{Equation}\mspace{20mu} 9} \right\rbrack & \; \\{{S(n)} = {\sum\limits_{i = 1}^{F_{L}}\; {S_{H}(i)}}} & (9)\end{matrix}$

In equation (9), n is a frame number, and F_(L) is a frame length.

Then, high band noise base estimation section 301 estimates high band noise base N(n) using equation (10) below.

$\begin{matrix}\left\lbrack {{Equation}\mspace{20mu} 10} \right\rbrack & \; \\{{N(n)} = \left\{ \begin{matrix}{N\left( {n - 1} \right)} & {{S(n)} > {\Theta \cdot {N\left( {n - 1} \right)}}} \\{{\left( {1 - \beta} \right) \cdot {N\left( {n - 1} \right)}} + {\beta \cdot {S(n)}}} & {{S(n)} \leq {\Theta \cdot {N\left( {n - 1} \right)}}}\end{matrix} \right.} & (10)\end{matrix}$

In equation (10), β is a moving average coefficient and Θ is a threshold value for distinguishing between speech and noise.

Next, SN ratio estimation section 302 applies high band speech signal S_(H) and high band noise base estimate value N(n) to equation (11) below, estimates ratio SN(n) between speech signal power and noise signal power at high band, and outputs the estimated ratio SN(n) to suppression coefficient adjustment section 305.

[Equation 11]

SN(n)=(1−ρ)·SN(n−1)+ρ·S(n)/N(n)  (11)

In equation (11), ρ is a moving average coefficient.
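
Equations (10) and (11) can be run together frame by frame, as in the sketch below. It takes the per-frame values S(n) as given, and it assumes a positive initial noise base `n0` and an initial SN ratio `sn0`; these initial values, the coefficient values and the function name `track_high_band` are assumptions not stated in the source.

```python
def track_high_band(s_values, beta, theta, rho, n0, sn0=1.0):
    """Per-frame high band noise base N(n) and smoothed ratio SN(n)."""
    n_prev, sn_prev = n0, sn0
    history = []
    for s in s_values:
        # Equation (10): hold the noise base when the frame looks like speech.
        if s > theta * n_prev:
            n_cur = n_prev
        else:
            n_cur = (1.0 - beta) * n_prev + beta * s
        # Equation (11): moving average of S(n) / N(n).
        sn_cur = (1.0 - rho) * sn_prev + rho * s / n_cur
        history.append((n_cur, sn_cur))
        n_prev, sn_prev = n_cur, sn_cur
    return history


print(track_high_band([2.0, 8.0], beta=0.5, theta=2.0, rho=0.5, n0=2.0))
```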

Next, speech/noise frame determination section 303 applies high band speech signal S_(H) and high band noise base estimate value N(n) to equation (12) below, determines speech/noise frame SNF(n), and outputs the determined speech/noise frame SNF(n) to suppression coefficient adjustment section 305.

$\begin{matrix}\left\lbrack {{Equation}\mspace{20mu} 12} \right\rbrack & \; \\{{{SNF}(n)} = \left\{ \begin{matrix}\left. {1\mspace{14mu} {speechframe}} \right) & {{{When}\mspace{14mu} {S(n)}} > {\Theta \cdot {N\left( {n - 1} \right)}}} \\{0\mspace{14mu} ({noiseframe})} & {{{When}\mspace{14mu} {S(n)}} \leq {{\Theta \cdot {N\left( {n - 1} \right)}}\mspace{20mu} {is}\mspace{14mu} {continued}\mspace{14mu} {for}\mspace{14mu} M\mspace{14mu} {frames}}}\end{matrix}\; \right.} & (12)\end{matrix}$

In equation (12), M is the number of hangover frames. As shown in equation (12), when S(n)>Θ·N(n−1), it is unconditionally determined that SNF(n)=1 (speech frame). On the other hand, when S(n)≦Θ·N(n−1) has continued for M frames, it is determined that SNF(n)=0 (noise frame), and when S(n)≦Θ·N(n−1) has not continued for M frames, it is determined that SNF(n)=1 (speech frame).
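
The hangover logic of equation (12) can be sketched with a run counter. Interpreting "continued for M frames" as a run of at least M consecutive frames including the current one is an assumption of this sketch, as are the function name `classify_frames` and the example values.

```python
def classify_frames(s_values, prev_noise_bases, theta, m):
    """SNF(n) per equation (12) with M hangover frames."""
    flags = []
    quiet_run = 0  # consecutive frames with S(n) <= theta * N(n-1)
    for s, n_prev in zip(s_values, prev_noise_bases):
        if s > theta * n_prev:
            quiet_run = 0
            flags.append(1)  # speech frame, unconditionally
        else:
            quiet_run += 1
            # Noise frame only once the quiet run has lasted M frames.
            flags.append(0 if quiet_run >= m else 1)
    return flags


print(classify_frames([10, 1, 1, 1, 10, 1], [2.0] * 6, theta=2.0, m=2))
```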

Next, suppression coefficient calculation section 304 applies high band speech signal S_(H) and high band noise base estimate value N(n) to equation (13), calculates suppression coefficient G_(H)(n) per frame, and outputs the calculated suppression coefficient G_(H)(n) per frame to suppression coefficient adjustment section 305.

$\begin{matrix}\left\lbrack {{Equation}\mspace{20mu} 13} \right\rbrack & \; \\{{G_{H}(n)} = \frac{\lambda \cdot {S(n)}}{{S(n)} + {\kappa \cdot {N(n)}}}} & (13)\end{matrix}$

In equation (13), parameter λ satisfies λ≦1, parameter κ satisfies κ≧1, and both are adjustable.
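
Equation (13) is a single expression per frame, sketched below. The function name `suppression_coefficient` and the example parameter values are assumptions; the sketch also assumes S(n)+κ·N(n) is positive.

```python
def suppression_coefficient(s_n, n_n, lam=1.0, kappa=1.0):
    """Equation (13): G_H(n) = lam * S(n) / (S(n) + kappa * N(n))."""
    return lam * s_n / (s_n + kappa * n_n)


print(suppression_coefficient(3.0, 1.0))                      # lam = 1, kappa = 1
print(suppression_coefficient(3.0, 1.0, lam=0.5, kappa=3.0))  # stronger suppression
```

Lowering λ or raising κ both reduce G_(H)(n), which is the handle that suppression coefficient adjustment section 305 uses.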

Next, suppression coefficient adjustment section 305 adjusts parametersλ and κ of suppression coefficient G_(H) (n) based on the resultsinputted from SN ratio estimation section 302, speech/noise framedetermination section 303, and suppression coefficient calculationsection 304, and outputs the adjustment results to suppressioncoefficient averaging processing section 306.

Next, suppression coefficient adjustment section 305, specifically,performs adjustment of parameter κ shown in equation (13) based on theestimate value of the SN ratio. For example, when the SN ratio is large,the value of κ is made greater, and when the SN ratio is small, a valueof κ is made smaller. Furthermore, adjustment of parameter λ shown inequation (13) is performed based on the determination result ofspeech/noise frame. For example, a value of λ is assumed to be 1 in aspeech frame, and a value of λ is assumed to be smaller than 1 in anoise frame.

Next, suppression coefficient averaging processing section 306 performs averaging processing of the suppression coefficient inputted from suppression coefficient adjustment section 305 using equation (14) below, and outputs the obtained average value of the suppression coefficient to high band multiplication section 307.

$\begin{matrix}\left\lbrack {{Equation}\mspace{20mu} 14} \right\rbrack & \; \\{{\overset{\_}{G_{H}}(n)} = \left\{ \begin{matrix}{{\left( {1 - \eta_{F}} \right) \cdot {\overset{\_}{G_{H}}\left( {n - 1} \right)}} + {\eta_{F} \cdot {G_{H}(n)}}} & {{G_{H}(n)} > {\overset{\_}{G_{H}}(n)}} \\{{\left( {1 - \eta_{S}} \right) \cdot {\overset{\_}{G_{H}}\left( {n - 1} \right)}} + {\eta_{S} \cdot {G_{H}(n)}}} & {{G_{H}(n)} \leq {\overset{\_}{G_{H}}(n)}}\end{matrix} \right.} & (14)\end{matrix}$

In equation (14), η_(F) and η_(S) are moving average coefficients, and there is a relationship of 0<η_(S)≦η_(F)<1.
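The averaging of equation (14) can be sketched, for example, as follows. The function name and the default coefficient values are assumptions. The sketch selects the branch by comparing G_(H)(n) with the previous average; because the new average always lies between the previous average and G_(H)(n), this selects the same branch as the condition written in equation (14).

```python
def average_suppression_coefficient(prev_average: float, coefficient: float,
                                    eta_fast: float = 0.5,
                                    eta_slow: float = 0.1) -> float:
    """Sketch of equation (14) with hypothetical default coefficients."""
    if not 0.0 < eta_slow <= eta_fast < 1.0:
        raise ValueError("requires 0 < eta_slow <= eta_fast < 1")
    # A rising coefficient is tracked with eta_fast (eta_F),
    # a falling or unchanged coefficient with eta_slow (eta_S).
    eta = eta_fast if coefficient > prev_average else eta_slow
    return (1.0 - eta) * prev_average + eta * coefficient
```

With the hypothetical defaults, a rise from an average of 0.2 toward a coefficient of 1.0 gives 0.6 after one frame, while a fall from an average of 1.0 toward 0.2 gives 0.92. The average therefore follows the onset of speech quickly and decays slowly afterward.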

Then, high band multiplication section 307 multiplies high band speech signal S_(H) by the average value of the suppression coefficient, generates noise-suppressed high band speech signal S_(J), and provides it to another input end of band combination section 106.

Thus, band combination section 106 combines speech signal S_(I) subjected to low-band noise suppression and speech signal S_(J) subjected to high-band noise suppression, and obtains an output of band division noise suppression apparatus 100. For example, first, to remove an imaging component, band combination section 106 performs filtering on speech signal S_(I) subjected to low-band noise suppression and speech signal S_(J) subjected to high-band noise suppression using the same lowpass filter and highpass filter as those used in band division. Next, the filtering results are added per frame and outputted as an output from band division noise suppression apparatus 100.
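The combination described above can be sketched, for example, as follows. The description does not specify the filters, so the function names and the use of causal FIR filters given as tap lists are assumptions. For brevity, the sketch also processes one frame in isolation and does not carry filter state from one frame to the next.

```python
from typing import List, Sequence


def fir_filter(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Causal FIR filter; the output has the same length as the input."""
    output = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        output.append(acc)
    return output


def combine_bands(low: Sequence[float], high: Sequence[float],
                  lowpass_taps: Sequence[float],
                  highpass_taps: Sequence[float]) -> List[float]:
    """Filter each band with its band division filter, then add per sample."""
    if len(low) != len(high):
        raise ValueError("both bands must have the same frame length")
    low_filtered = fir_filter(low, lowpass_taps)     # removes the imaging component
    high_filtered = fir_filter(high, highpass_taps)
    return [a + b for a, b in zip(low_filtered, high_filtered)]
```

Here, `low` stands for the interpolated speech signal S_(I) and `high` for speech signal S_(J), both at the sampling rate of the input speech signal.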

In this way, according to this embodiment, the input speech signal is divided into a speech signal including a low frequency component and a speech signal including a high frequency component, and decimation processing is performed on the low frequency signal, where the power of the input speech signal is large, so that it is possible to perform more accurate noise suppression processing with a small amount of calculation. Furthermore, a simpler noise suppression processing method than the low band noise suppression processing is applied to the high frequency signal, where the power of the input speech signal is small, so that it is possible to reduce speech distortion and remove noise adequately with a smaller amount of calculation.

At this time, in suppression processing of low band noise, first, a voiced band and a noise band are detected, and a speech pitch harmonic power spectrum that is buried in noise and missing is repaired based on the estimated pitch frequency. Next, the determination result of the voiced band and noise band is corrected by combining the pitch harmonic power spectrum and the detection results of the voiced band and noise band, so that it is possible to determine the voiced band and noise band more accurately. As a result, subtraction processing with a small degree of attenuation and attenuation processing with a large degree of attenuation can be performed on the voiced band and the noise band, respectively, so that it is possible to perform noise suppression with little speech distortion even if the amount of attenuation is made large.

Furthermore, in high band noise suppression processing, a noise suppression coefficient and an average value thereof are calculated for signal components of high band frequency, and noise suppression processing is performed in time domain, so that it is possible to substantially reduce the amount of calculation and the amount of memory.

Furthermore, in high band noise suppression processing, suppression coefficient calculation is performed based on an addition value of speech signal power of a high frequency and an estimate value of high band noise base, so that it is possible to calculate the suppression coefficient with a small amount of processing.

Furthermore, in high band noise suppression processing, high band noise suppression is performed using the estimation result of the high band SN ratio, so that it is possible to adjust the amount of high band noise suppression according to changes in the SN ratio, and thereby improve noise suppression performance between low band and high band. Furthermore, high band noise suppression is performed using the high band speech/noise frame determination result, so that it is possible to further reduce noise in the noise frame, and thereby substantially suppress high band noise which can be easily heard.

Still further, in high band noise suppression processing, averaging processing of suppression coefficients is performed, so that it is possible to improve continuity between frames and obtain noise suppression performance with high speech quality.

The present application is based on Japanese Patent Application No. 2005-014772, filed on Jan. 21, 2005, the entire content of which is expressly incorporated by reference herein.

INDUSTRIAL APPLICABILITY

The present invention is useful as a noise suppression apparatus that can reduce speech distortion and remove noise adequately with a small amount of calculation, and in particular, is suitable for use in mobile telephones.

1. A band division noise suppression apparatus comprising: a band division section that performs band division on an input speech signal into a low band speech signal including a low frequency noise component and a high band speech signal including a high frequency noise component; a decimation processing section that performs down-sampling and decimation processing on the low band speech signal; a low band noise suppression section that suppresses noise included in the low band speech signal subjected to the decimation processing; an interpolation processing section that performs up-sampling and interpolation processing on the noise-suppressed low band speech signal; a high band noise suppression section that suppresses noise included in the high band speech signal; and a band combination section that combines the low band speech signal subjected to the interpolation processing and the high band speech signal subjected to the noise suppression processing.
2. The band division noise suppression apparatus according to claim 1, wherein the low band noise suppression section comprises: a low band noise base estimation section that estimates noise base comprising a noise component spectrum from a low band speech power spectrum; a voiced/noise detection section that detects a voiced band and a noise band from the speech power spectrum using the speech power spectrum and the noise base; a pitch harmonic structure extraction section that extracts a pitch harmonic power spectrum from the speech power spectrum using the speech power spectrum and the noise base; a pitch frequency estimation section that estimates a pitch frequency in the speech power spectrum using the speech power spectrum and the noise base; a pitch harmonic structure repairing section that repairs the extracted pitch harmonic power spectrum using the estimated pitch frequency; a voiced/noise correction section that corrects the detected voiced band and noise band using the repaired pitch harmonic power spectrum; a subtraction/attenuation coefficient calculation section that calculates a subtraction/attenuation coefficient for performing subtraction and attenuation on the voiced band and noise band corrected using the speech power spectrum and the noise base; and a reconstruction section that multiplies the low band speech power spectrum by the subtraction/attenuation coefficient, and reconstructs a speech power spectrum in which a noise component is suppressed.
3. The band division noise suppression apparatus according to claim 1, wherein the high band noise suppression section comprises: a suppression coefficient calculation section that calculates a suppression coefficient indicating a degree of noise suppression in a predetermined time unit; a suppression coefficient adjustment section that adjusts a parameter of the calculated suppression coefficient; and an averaging processing section that performs averaging processing of the adjusted suppression coefficient.
4. The band division noise suppression apparatus according to claim 3, further comprising a high band noise base estimation section that estimates a high band noise base comprising a noise component based on a power addition value of the high band speech signal in the predetermined time unit, wherein the suppression coefficient calculation section calculates a suppression coefficient based on the power addition value of the high band speech signal and the high band noise base estimate value.
5. The band division noise suppression apparatus according to claim 3, comprising: an SN ratio estimation section that estimates an SN ratio comprising a ratio between speech signal power and noise signal power in the predetermined time unit; and a speech/noise frame determination section that determines a speech frame and a noise frame based on the high band speech signal and the high band noise base, wherein the suppression coefficient adjustment section adjusts a parameter of a suppression coefficient based on the estimated SN ratio and the determined speech frame and noise frame.
6. The band division noise suppression apparatus according to claim 3, wherein the averaging processing section performs averaging processing on the obtained suppression coefficient, and performs noise suppression processing on a high band speech signal in a predetermined time unit using the averaging processing result.
7. A band division noise suppression method comprising: a band division step of performing band division on an input speech signal into a low band speech signal including a low frequency noise component and a high band speech signal including a high frequency noise component; a decimation processing step of performing down-sampling and decimation processing on the low band speech signal; a low band noise suppression step of suppressing noise included in the low band speech signal subjected to the decimation processing; an interpolation processing step of performing up-sampling and interpolation processing on the noise-suppressed low band speech signal; a high band noise suppression step of suppressing noise included in the high band speech signal; and a band combination step of combining the low band speech signal subjected to the interpolation processing and the high band speech signal subjected to the noise suppression processing.
8. The band division noise suppression method according to claim 7, wherein the low band noise suppression step comprises the steps of: estimating a noise base comprising a noise component spectrum from a low band speech power spectrum; detecting voiced band and noise band from the speech power spectrum using the speech power spectrum and the noise base; extracting a pitch harmonic power spectrum from the speech power spectrum using the speech power spectrum and the noise base; estimating a pitch frequency in the speech power spectrum using the speech power spectrum and the noise base; repairing the extracted pitch harmonic power spectrum using the estimated pitch frequency; correcting the detected voiced band and noise band using the repaired pitch harmonic power spectrum; calculating a subtraction/attenuation coefficient for performing subtraction and attenuation on the voiced band and noise band corrected using the speech power spectrum and the noise base; and reconstructing a speech power spectrum in which a noise component is suppressed by multiplying the low band speech power spectrum by the subtraction/attenuation coefficient.
9. The band division noise suppression method according to claim 7, wherein the high band noise suppression step comprises the steps of: estimating high band noise base comprising a noise component based on a power addition value of the high band speech signal in a predetermined time unit; estimating an SN ratio comprising a ratio between speech signal power and noise signal power; determining a speech frame and a noise frame based on the high band speech signal and the high band noise base; calculating a suppression coefficient indicating a degree of noise suppression based on the power addition value of the high band speech signal and the high band noise base estimate value; adjusting a parameter of the calculated suppression coefficient based on the estimated SN ratio and the determined speech frame and noise frame; and performing averaging processing of the adjusted suppression coefficient and performing suppression processing on the high band speech signal in a predetermined time unit using the average processing result.